
Make serve a read-only consumer of the on-disk cache #33

Merged
jdoss merged 1 commit into master from fix/serve-cache-read-only on Apr 17, 2026

@jdoss jdoss commented Apr 17, 2026

Summary

Serve and setup each hold their own in-memory dict of the on-disk cache. Without coordination, serve's cache.save() on a lookup miss overwrites setup's freshly pruned state with serve's older dict, resurrecting the stale entries that PR #32's prune step had just removed.

Observed on the test server immediately after deploying PR #32: cache grew back to 15k+ entries within minutes of setup pruning it down to ~500.
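
The overwrite described above can be reproduced in isolation. The sketch below is illustrative only: `DiskCache`, its JSON file layout, and `demo_resurrection` are hypothetical stand-ins, not the project's actual cache class. It assumes only what the summary states, namely that each process loads the file into its own dict and that `save()` rewrites the whole file.

```python
import json
import tempfile
from pathlib import Path


class DiskCache:
    """Hypothetical stand-in: a dict mirrored to a JSON file on disk."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Each process takes its own snapshot of the file at load time.
        self.data: dict[str, str] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def set(self, key: str, value: str) -> None:
        self.data[key] = value  # in-memory only

    def save(self) -> None:
        # Whole-file overwrite: whatever is on disk is replaced by this dict.
        self.path.write_text(json.dumps(self.data))


def demo_resurrection() -> tuple[int, int]:
    """Return (entries on disk after prune, entries after serve's save)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "cache.json"

        # Seed the file with 100 stale entries and one live entry.
        seed = DiskCache(path)
        for i in range(100):
            seed.set(f"stale-{i}", "x")
        seed.set("live", "v")
        seed.save()

        serve = DiskCache(path)  # long-running serve loads the unpruned file

        setup = DiskCache(path)  # setup loads, prunes, and saves
        setup.data = {k: v for k, v in setup.data.items() if k == "live"}
        setup.save()
        pruned = len(json.loads(path.read_text()))

        serve.set("new-secret", "v2")  # cache miss populates serve's old dict
        serve.save()                   # old behaviour: clobbers the pruned file
        after = len(json.loads(path.read_text()))
        return pruned, after
```

Under these assumptions, the file holds 1 entry right after the prune and 102 after serve's save: the 100 stale entries come back along with the new one.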

Fix: drop the cache.save() call on the serve cache-miss path. Values still populate the in-memory dict via cache.set(), so subsequent lookups for the same secret in the same process still hit the cache, and existing tests asserting on in-memory presence still pass. The disk file is now owned exclusively by setup, which prunes on every run.
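
A minimal sketch of the resulting cache-miss path, assuming a cache object with `get`, `set`, and `save` methods. Only `cache.set()` and `cache.save()` are named in this PR; `MemoryCache`, `lookup`, `fetch`, and the `saves` counter are hypothetical names used for illustration and do not reflect the real serve code.

```python
from typing import Callable, Optional


class MemoryCache:
    """Hypothetical cache that counts save() calls so the sketch can check them."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.saves = 0

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value

    def save(self) -> None:
        self.saves += 1  # stands in for writing the file to disk


def lookup(cache: MemoryCache, name: str, fetch: Callable[[str], str]) -> str:
    """Return a secret, falling through to the provider on a cache miss."""
    cached = cache.get(name)
    if cached is not None:
        return cached
    value = fetch(name)     # ask the provider
    cache.set(name, value)  # populate the in-memory dict only
    # Intentionally no cache.save() here: the disk file belongs to setup.
    return value
```

With this shape, a second lookup for the same name in the same process is served from memory, and the save counter stays at zero.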

Tradeoff

A secret lazily cached during serve runtime (e.g. an HSM-stored secret outside config.workloads) is lost on serve restart and will miss the cache on its next lookup. This is acceptable: the cache's purpose is surviving provider outages for workload secrets, and those are always populated by setup.
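
The tradeoff can be illustrated the same way. In this sketch the JSON file layout, the key names, and the `load` and `demo_restart` helpers are all assumptions made for the example; it shows only that an entry held in serve's memory does not survive a reload from disk, while an entry written by setup does.

```python
import json
import tempfile
from pathlib import Path


def load(path: Path) -> dict[str, str]:
    """Hypothetical loader: read the on-disk cache into a fresh dict."""
    return json.loads(path.read_text()) if path.exists() else {}


def demo_restart() -> tuple[bool, bool]:
    """Return (workload secret survives restart, lazy secret survives restart)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "cache.json"

        # setup owns the file and writes the workload secrets.
        path.write_text(json.dumps({"workload/db-password": "s3cret"}))

        # serve loads the file, then lazily caches an extra secret in memory.
        serve_memory = load(path)
        serve_memory["hsm/signing-key"] = "lazy"  # never written to disk

        # A serve restart reloads from disk; the in-memory dict is gone.
        restarted = load(path)
        return (
            "workload/db-password" in restarted,
            "hsm/signing-key" in restarted,
        )
```

Here the workload secret is still present after the simulated restart and the lazily cached one is not, so its next lookup goes back to the provider.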

Test plan

  • pytest tests/test_serve.py tests/test_serve_offline.py tests/test_setup.py — all 38 tests pass, including test_cache_miss_falls_through_to_provider which asserts the in-memory cache still gets populated after a miss.
  • ruff check / ty check — clean.
  • Deploy to test server, run setup to prune, observe cache size stays at the real secret count instead of growing back to 15k+.

@jdoss jdoss merged commit be56aa3 into master Apr 17, 2026
2 checks passed